Bit Grooming: statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+)

Author

  • Charles S. Zender
Abstract

Geoscientific models and measurements generate false precision (scientifically meaningless data bits) that wastes storage space. False precision can mislead (by implying noise is signal) and be scientifically pointless, especially for measurements. By contrast, lossy compression can be both economical (save space) and heuristic (clarify data limitations) without compromising the scientific integrity of data. Data quantization can thus be appropriate regardless of whether space limitations are a concern. We introduce, implement, and characterize a new lossy compression scheme suitable for IEEE floating-point data. Our new Bit Grooming algorithm alternately shaves (to zero) and sets (to one) the least significant bits of consecutive values to preserve a desired precision. This is a symmetric, two-sided variant of an algorithm sometimes called Bit Shaving that quantizes values solely by zeroing bits. Our variation eliminates the artificial low bias produced by always zeroing bits, and makes Bit Grooming more suitable for arrays and multi-dimensional fields whose mean statistics are important. Bit Grooming relies on standard lossless compression to achieve the actual reduction in storage space, so we tested Bit Grooming by applying the DEFLATE compression algorithm to bit-groomed and full-precision climate data stored in netCDF3, netCDF4, HDF4, and HDF5 formats. Bit Grooming reduces the storage space required by initially uncompressed and compressed climate data by 25–80 % and 5–65 %, respectively, for single-precision values (the most common case for climate data) quantized to retain 1–5 decimal digits of precision. The potential reduction is greater for double-precision datasets. When used aggressively (i.e., preserving only 1–2 digits), Bit Grooming produces storage reductions comparable to other quantization techniques such as Linear Packing. Unlike Linear Packing, whose guaranteed precision rapidly degrades within the relatively narrow dynamic range of values that it can compress, Bit Grooming guarantees the specified precision throughout the full floating-point range. Data quantization by Bit Grooming is irreversible (i.e., lossy) yet transparent, meaning that no extra processing is required by data users/readers. Hence Bit Grooming can easily reduce data storage volume without sacrificing scientific precision or imposing extra burdens on users.
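
The alternating shave/set step described in the abstract can be illustrated with a minimal Python sketch for single-precision values. This is not the NCO implementation. The function name bit_groom, the rule for converting decimal digits to retained mantissa bits (ceil(nsd * log2(10)) + 1), and the choice to leave zeros and non-finite values untouched are all assumptions made for illustration.

```python
import math
import struct

MANTISSA_BITS = 23  # explicit significand bits in IEEE 754 single precision


def float_to_bits(x):
    """Return the 32-bit pattern of x after rounding to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b):
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def bit_groom(values, nsd):
    """Alternately shave and set the trailing mantissa bits of values.

    nsd is the number of significant decimal digits to retain. The
    digits-to-bits rule below is an assumption of this sketch, not
    necessarily the rule NCO uses.
    """
    keep = min(MANTISSA_BITS, math.ceil(nsd * math.log2(10)) + 1)
    drop = MANTISSA_BITS - keep
    shave_mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF  # ones over the kept bits
    set_mask = ~shave_mask & 0xFFFFFFFF             # ones over the dropped bits
    groomed = []
    for i, v in enumerate(values):
        if not math.isfinite(v):
            groomed.append(v)        # assumption: pass NaN and infinity through
            continue
        bits = float_to_bits(v)
        if i % 2 == 0:
            bits &= shave_mask       # shave: trailing bits to zero
        elif v != 0.0:
            bits |= set_mask         # set: trailing bits to one (zeros kept exact)
        groomed.append(bits_to_float(bits))
    return groomed
```

For example, bit_groom([3.14159265, 3.14159265], 3) shaves the first value downward and sets the second upward, so the two quantization errors have opposite signs and partly cancel in a mean. The trailing runs of identical bits are what a lossless compressor such as DEFLATE can then exploit.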

Similar resources

Analysis of self-describing gridded geoscience data with netCDF Operators (NCO)

The netCDF Operator (NCO) software facilitates manipulation and analysis of gridded geoscience data stored in the self-describing netCDF format. NCO is optimized to efficiently analyze large multi-dimensional datasets spanning many files. Researchers and data centers often use NCO to analyze and serve observed and modeled geoscience data including satellite observations and weather, air...

Color image compression using quantization, thresholding, and edge detection techniques all based on the moment-preserving principle

A new approach to color image compression with high compression ratios and good quality of reconstructed images using quantization, thresholding, and edge detection all based on the moment-preserving principle is proposed. An input image with 24 bits per pixel is quantized into 8 bits per pixel using a new color quantization method based on the moment-preserving principle. The quantized image i...

Scaling Properties of Common Statistical Operators for Gridded Datasets

An accurate cost-model that accounts for dataset size and structure can help optimize geoscience data analysis. We develop and apply a computational model to estimate data analysis costs for arithmetic operations on gridded datasets typical of satellite- or climate model-origin. For these dataset geometries our model predicts data reduction scalings that agree with measurements of widely-used geo...

The Effects of Data Compression on SAR Change Detection

The performance of coherent and non-coherent change detection algorithms is evaluated using complex SAR data that have been processed with various data compression approaches; the hope is that it may be possible to achieve higher compression ratios than could be achieved using classical image compression approaches such as BAQ (block adaptive quantization). BAQ compression is typically applied ...

Publication date: 2016